Processing output from bioinformatics pipelines

…BRIEF INTRO IN PROGRESS…


A tentative snakemake workflow that defines data processing rules in a DAG (directed acyclic graph) format. A detailed interactive snakemake HTML report is available here. You will be able to explore the workflow and the associated statistics. You can close the left bar (overlap) to get a more expansive display view.




Processing MOTHUR output

Import library

library(tidyverse, suppressPackageStartupMessages())

Mothur metadata

library(tidyverse, suppressPackageStartupMessages())

Mothur otutable

Mothur taxonomy

Mothur composite




Processing QIIME2 output

QIIME2 metadata

library(tidyverse, suppressPackageStartupMessages())

# QIIME2  metadata
qiime2_metadata <- read_tsv("resources/metadata/qiime2_metadata_file.tsv", 
                            show_col_types = FALSE) %>% 
  rename(sample_id="sample-id")

QIIME2 otutable

qiime2_otutable <- read_tsv("qiime2_process/transformed/feature-table.tsv", skip = 1, show_col_types = FALSE) %>%
  rename(feature='#OTU ID') %>%
  select(-starts_with('Mock')) %>% 
  mutate_at(2:ncol(.), as.numeric) %>% 
  mutate_if(is.numeric, ~replace(., is.na(.), 0))

qiime2_otutable <- qiime2_otutable %>% 
  pivot_longer(-feature, names_to = "sample_id", values_to = "count") %>% 
  relocate(sample_id, .before = feature)

QIIME2 taxonomy

qiime2_taxonomy <- read_tsv("qiime2_process/transformed/taxonomy.tsv", show_col_types=FALSE) %>% 
  rename(feature="Feature ID") %>% 
  distinct() %>%
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
  mutate(Taxon = str_replace_all(Taxon, "; s__$", ""),
         Taxon = str_replace_all(Taxon, "; g__$", ""),
         Taxon = str_replace_all(Taxon, "; f__$", ""),
         Taxon = str_replace_all(Taxon, "; o__$", ""),
         Taxon = str_replace_all(Taxon, "; c__$", ""),
         Taxon = str_replace_all(Taxon, "; p__$", ""),
         Taxon = str_replace_all(Taxon, "; k__$", ""),
         Taxon = str_replace_all(Taxon, "\\[|\\]", ""),
         Taxon = str_replace_all(Taxon, "\\s", "")) %>%
  dplyr::filter(!grepl("s__*", Taxon)) %>%
  dplyr::filter(grepl("g__*", Taxon)) %>% 
  select(-Confidence) %>% 
  mutate(Taxon = str_replace_all(Taxon, "\\w__", "")) %>% 
  separate(Taxon, into = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";")

QIIME2 composite

qiime2_composite <- inner_join(qiime2_metadata, qiime2_otutable, by = "sample_id") %>% 
  inner_join( ., qiime2_taxonomy, by = "feature") %>% 
  group_by(sample_id) %>% 
  mutate(rel_abund = count/sum(count)) %>% 
  ungroup() %>% 
  mutate_if(is.numeric, ~replace(., is.na(.), 0)) %>%
  relocate(count, .before = rel_abund) 

write_tsv(qiime2_composite, "qiime2_process/transformed/qiime2_composite.tsv")



Citation

Please consider citing the iMAP article[1] if you find any part of the IMAP practical user guides helpful in your microbiome data analysis.


References

[1]
Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4



Appendix

Project main tree

.
├── LICENSE.md
├── README.md
├── UntitledRMD.Rmd
├── config
│   ├── config.yml
│   ├── pbs
│   ├── samples.tsv
│   ├── slurm
│   └── units.tsv
├── css
│   └── styles.css
├── dags
│   ├── rulegraph.png
│   └── rulegraph.svg
├── data
│   ├── qiime2_composite.csv
│   ├── qiime2_tidy_metadata.csv
│   ├── qiime2_tidy_otutable.csv
│   └── qiime2_tidy_taxonomy.csv
├── images
│   ├── bkgd.png
│   ├── coders.png
│   ├── preprocess.png
│   ├── processdata.png
│   └── smkreport
├── index.Rmd
├── library
│   ├── apa.csl
│   ├── imap.bib
│   └── references.bib
├── report.html
├── resources
├── results
│   └── project_tree.txt
├── styles.css
└── workflow
    ├── Snakefile
    ├── envs
    ├── rules
    ├── schemas
    ├── scripts
    └── workflow.Rproj

17 directories, 26 files



Troubleshooting of FAQs

  1. Question
    • Answer
  2. Question
    • Answer